Abstract

In the practical business environment, portfolio managers often face business-driven requirements that limit the number of constituents in their tracking portfolio. A natural index tracking model is thus to minimize a tracking error measure while enforcing an upper bound on the number of assets in the portfolio. In this paper we consider such a cardinality-constrained index tracking model. In particular, we propose an efficient nonmonotone projected gradient (NPG) method for solving this problem. At each iteration, this method usually solves several projected gradient subproblems. We show that each subproblem has a closed-form solution, which can be computed in linear time. Under some suitable assumptions, we establish that any accumulation point of the sequence generated by the NPG method is a local minimizer of the cardinality-constrained index tracking problem. We also conduct empirical tests to compare our method with the hybrid evolutionary algorithm [28] and the hybrid half thresholding algorithm [30] for index tracking. The computational results demonstrate that our approach generally produces sparse portfolios with smaller out-of-sample tracking error and higher consistency between in-sample and out-of-sample tracking errors. Moreover, our method outperforms the other two approaches in terms of speed.
1 Introduction

Index tracking aims to replicate the performance and risk profile of a given market index by constructing a tracking portfolio whose performance is as close as possible to that of the index. The index tracking problem has received a great deal of attention in the literature. An obvious approach is full replication of the index. It can, however, incur high administrative and transaction costs. Also, in the practical business environment, portfolio managers often face business-driven requirements that limit the number of constituents in their tracking portfolio. Therefore, tracking an index with a limited number of constituents can reduce transaction costs and avoid holding small and illiquid assets when the index has a large number of constituents.

In this paper we consider a natural model for index tracking, which minimizes a quadratic tracking error while enforcing an upper bound on the number of assets in the portfolio. When short selling is not allowed, this model can be formulated mathematically as

$$\min_{x \in \Delta_r} \; TE(x) := \|y - Rx\|^2 / T. \qquad (1.1)$$

Here, $x \in \Re^n$ is the weight vector of $n$ index constituents; $y \in \Re^T$ is a sample vector of portfolio returns over a period of length $T$; $R \in \Re^{T \times n}$ consists of the sample returns of index constituents over the same period;

$$\Delta_r := \left\{ x \in \Re^n : \sum_{i=1}^n x_i = 1, \ \|x\|_0 \le r, \ 0 \le x_i \le u, \ i = 1, \ldots, n \right\}; \qquad (1.2)$$

$\|x\|_0$ denotes the number of nonzero entries of $x$; and $u \in [1/r, 1]$ is an upper bound on the weight of each index constituent. The sum of squared errors is used here to measure the tracking error between the returns of the index and the returns of a portfolio. We note that another possible tracking error measure is the weighted sum of squared errors. Recently, Gao and Li [18] studied a related but different cardinality-constrained portfolio selection model, which minimizes the variance of the portfolio subject to a given expected return and a cardinality restriction on the assets. They developed some efficient lower bounding schemes and proposed a branch-and-bound algorithm to solve the model.
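
To make the model concrete, the following minimal sketch evaluates the tracking error of (1.1) and checks membership in the feasible set (1.2). The function names, the tolerance `tol`, and the toy return data are assumptions made for illustration; they do not come from the paper.

```python
from typing import Sequence


def tracking_error(R: Sequence[Sequence[float]], y: Sequence[float],
                   x: Sequence[float]) -> float:
    """TE(x) = ||y - R x||^2 / T for a T-by-n matrix R of constituent returns."""
    T = len(y)
    total = 0.0
    for t in range(T):
        portfolio_return = sum(R[t][i] * x[i] for i in range(len(x)))
        total += (y[t] - portfolio_return) ** 2
    return total / T


def is_feasible(x: Sequence[float], r: int, u: float, tol: float = 1e-9) -> bool:
    """Check the budget, box, and cardinality constraints of (1.2)."""
    budget_ok = abs(sum(x) - 1.0) <= tol
    box_ok = all(-tol <= xi <= u + tol for xi in x)
    cardinality_ok = sum(1 for xi in x if abs(xi) > tol) <= r
    return budget_ok and box_ok and cardinality_ok


# Toy data (made up): T = 3 periods, n = 3 constituents.
R = [[0.01, 0.02, -0.01],
     [0.00, 0.01, 0.03],
     [-0.02, 0.00, 0.01]]
y = [0.015, 0.005, -0.01]
x = [0.5, 0.5, 0.0]

te = tracking_error(R, y, x)
feasible_r2 = is_feasible(x, r=2, u=0.5)
feasible_r1 = is_feasible(x, r=1, u=0.5)
```

In this toy example the two-asset portfolio reproduces `y` up to rounding, so the tracking error is essentially zero; the same portfolio violates the cardinality bound when `r = 1`.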

The index tracking problem (1.1) involves a cardinality constraint and is generally NP-hard. It is thus highly challenging to find a globally optimal solution to this problem. Recently, Fastrich et al. [16] studied a relaxation of (1.1) in which the cardinality constraint is replaced by an upper bound on the $l_q$-norm ($0 < q < 1$) [12] of the vector of portfolio weights. Xu et al. [30] considered a special case of this relaxation with $q = 1/2$ and proposed a hybrid half thresholding algorithm for solving this $l_{1/2}$ regularized index tracking model. Lately, Chen et al. proposed a new relaxation of problem (1.1), which minimizes the $l_q$-norm regularized tracking error. They also proposed an interior point method to solve the model. On the other hand, a local optimal solution of (1.1) can be found by the penalty decomposition method and the iterative hard thresholding method that were proposed in [22] and [23], respectively. However, these are generic methods for a more general class of cardinality-constrained optimization problems. When applied to problem (1.1), they may not be efficient since they cannot exploit the specific structure of the feasible region of problem (1.1).

Nonmonotone projected gradient (NPG) methods, which incorporate the nonmonotone line search technique proposed in [20] into projected gradient methods, have been widely studied in the literature. For example, Birgin et al. [7] studied the convergence of an NPG method for minimizing a smooth function over a closed convex set. Dai and Fletcher studied an NPG method for solving box-constrained quadratic programming in which Barzilai and Borwein's scheme [4] is used to choose the initial stepsize. Recently, Francisco and Bazan proposed an NPG method for minimizing a smooth objective over a general nonconvex set and showed that it converges to a generalized stationary point that is a fixed point of a certain proximal mapping. It is known that NPG methods generally outperform the classical (monotone) projected gradient methods in terms of speed and/or solution quality. In this paper, we propose a simple NPG method for solving problem (1.1). At each iteration, our method usually solves several projected gradient subproblems. By exploiting the specific structure of the feasible region of problem (1.1), we show that each projected gradient subproblem has a closed-form solution, which can be computed in linear time. Moreover, we show that any accumulation point of the sequence generated by our method is an optimal solution of a related convex optimization problem. Under some suitable assumption, we further establish that such an accumulation point is a local minimizer of problem (1.1). We also conduct empirical tests to compare our method with the other two approaches proposed in [28, 30] for index tracking. The computational results demonstrate that our approach generally produces sparse portfolios with smaller out-of-sample tracking error and higher consistency between in-sample and out-of-sample tracking errors. Moreover, our method outperforms the other two approaches in terms of speed.

The rest of the paper is organized as follows. In Section 2 we propose a nonmonotone projected gradient method for solving a class of optimization problems that includes problem (1.1) as a special case and establish its convergence. In Section 3 we conduct empirical tests to compare our method with the other two existing approaches for index tracking. We present some concluding remarks in Section 4.

2 Nonmonotone projected gradient method 

In this section we propose a nonmonotone projected gradient (NPG) method for solving the problem

$$\min_{x \in \Delta_r} f(x), \qquad (2.1)$$

where $\Delta_r$ is defined in (1.2) and $f : \Re^n \to \Re$ is Lipschitz continuously differentiable, that is, there is a constant $L_f > 0$ such that

$$\|\nabla f(x) - \nabla f(y)\| \le L_f \|x - y\| \quad \forall x, y \in \Re^n. \qquad (2.2)$$

Throughout this paper, $\|\cdot\|$ denotes the standard Euclidean norm. It is clear that problem (2.1) includes (1.1) as a special case. Therefore, the NPG method proposed below can be applied to solve problem (1.1).

Nonmonotone projected gradient (NPG) method for (2.1)

Let $0 < L_{\min} < L_{\max}$, $\tau > 1$, $c > 0$, and an integer $M \ge 0$ be given. Choose an arbitrary $x^0 \in \Delta_r$ and set $k = 0$.

1) Choose $L_k^0 \in [L_{\min}, L_{\max}]$ arbitrarily. Set $L_k = L_k^0$.

   1a) Solve the subproblem

   $$x^{k+1} \in \mathrm{Arg}\min_{x \in \Delta_r} \left\{ \nabla f(x^k)^T (x - x^k) + \frac{L_k}{2} \|x - x^k\|^2 \right\}. \qquad (2.3)$$

   1b) If

   $$f(x^{k+1}) \le \max_{[k-M]^+ \le i \le k} f(x^i) - \frac{c}{2} \|x^{k+1} - x^k\|^2 \qquad (2.4)$$

   is satisfied, then go to step 2).

   1c) Set $L_k \leftarrow \tau L_k$ and go to step 1a).

2) Set $k \leftarrow k + 1$ and go to step 1).

end


Remark.

(i) When $M = 0$, the sequence $\{f(x^k)\}$ is monotonically decreasing. Otherwise, it may increase at some iterations and thus the above method is generally a nonmonotone method.

(ii) A popular choice of $L_k^0$ is given by the following formula proposed by Barzilai and Borwein [4]:

$$L_k^0 = \max\left\{ L_{\min}, \ \min\left\{ L_{\max}, \ \frac{(s^k)^T y^k}{\|s^k\|^2} \right\} \right\},$$

where $s^k = x^k - x^{k-1}$ and $y^k = \nabla f(x^k) - \nabla f(x^{k-1})$.
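
The steps of the NPG method and the Barzilai-Borwein choice in the remark can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the projection uses bisection on the shift as a stand-in for brevity, the starting value `L0 = 1.0`, the cap of 60 inner trials, and the stopping rule on the step length are choices made here, and the toy objective is made up.

```python
from typing import Callable, List

Vector = List[float]


def project_bisection(a: Vector, r: int, u: float) -> Vector:
    """Approximate projection of a onto {x : sum(x)=1, ||x||_0<=r, 0<=x_i<=u}.

    Keeps the r largest entries of a and finds the common shift by bisection
    (a stand-in chosen for brevity; it assumes u >= 1/r).
    """
    keep = sorted(range(len(a)), key=lambda i: a[i], reverse=True)[:r]
    lo = -max(a[i] for i in keep)      # every kept entry clips to 0, sum = 0
    hi = u - min(a[i] for i in keep)   # every kept entry clips to u, sum = r*u
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        total = sum(min(max(a[i] + mid, 0.0), u) for i in keep)
        if total < 1.0:
            lo = mid
        else:
            hi = mid
    x = [0.0] * len(a)
    for i in keep:
        x[i] = min(max(a[i] + hi, 0.0), u)
    return x


def npg(f: Callable[[Vector], float], grad: Callable[[Vector], Vector],
        x0: Vector, r: int, u: float, L_min: float = 1e-8, L_max: float = 1e8,
        tau: float = 2.0, c: float = 1e-4, M: int = 3,
        max_iter: int = 1000, tol: float = 1e-6) -> Vector:
    x, g = list(x0), grad(x0)
    history = [f(x)]
    L0 = 1.0  # assumed initial trial value inside [L_min, L_max]
    for _outer in range(max_iter):
        L = min(max(L0, L_min), L_max)
        reference = max(history[-(M + 1):])  # max of f over the last M+1 iterates
        for _inner in range(60):  # cap on inner trials as a numerical safeguard
            trial = project_bisection([xi - gi / L for xi, gi in zip(x, g)], r, u)
            step_sq = sum((p - q) ** 2 for p, q in zip(trial, x))
            if f(trial) <= reference - 0.5 * c * step_sq:  # test (2.4)
                break
            L *= tau  # step 1c)
        g_trial = grad(trial)
        s = [p - q for p, q in zip(trial, x)]
        dy = [p - q for p, q in zip(g_trial, g)]
        if step_sq > 0.0:
            L0 = sum(p * q for p, q in zip(s, dy)) / step_sq  # Barzilai-Borwein value
        x, g = trial, g_trial
        history.append(f(x))
        if step_sq ** 0.5 <= tol:
            break
    return x


# Toy problem (made-up data): f(x) = ||y - x||^2 / 3, i.e. R is the identity.
y = [0.6, 0.4, 0.0]
f = lambda x: sum((yi - xi) ** 2 for yi, xi in zip(y, x)) / 3
grad = lambda x: [2.0 * (xi - yi) / 3 for xi, yi in zip(x, y)]
x_star = npg(f, grad, x0=[0.0, 0.5, 0.5], r=2, u=1.0)
```

On this toy problem the target `y` is itself feasible with two nonzero entries, so the iterates should settle at `y`. Passing `M=0` turns the acceptance test into a monotone one.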


We first show that for each outer iteration of the above NPG method, the number of its inner iterations is finite.

Theorem 2.1 For each $k \ge 0$, the inner termination criterion (2.4) is satisfied after at most

$$\max\left\{ \left\lceil \frac{\log(L_f + c) - \log(L_{\min})}{\log \tau} + 1 \right\rceil, \ 1 \right\}$$

inner iterations.

Proof. Let $\bar L_k$ denote the final value of $L_k$ at the $k$th outer iteration, and let $n_k$ denote the number of inner iterations for the $k$th outer iteration. We divide the proof into two separate cases.

Case 1): $\bar L_k = L_k^0$. It is clear that $n_k = 1$.

Case 2): $\bar L_k > L_k^0$. Let $H(x)$ denote the objective function of (2.3). By the definition of $x^{k+1}$, we know that $H(x^{k+1}) \le H(x^k) = 0$, which implies that

$$\nabla f(x^k)^T (x^{k+1} - x^k) + \frac{L_k}{2} \|x^{k+1} - x^k\|^2 \le 0.$$

In addition, it follows from (2.2) that

$$f(x^{k+1}) \le f(x^k) + \nabla f(x^k)^T (x^{k+1} - x^k) + \frac{L_f}{2} \|x^{k+1} - x^k\|^2.$$

Combining these two inequalities, we obtain that

$$f(x^{k+1}) \le f(x^k) - \frac{L_k - L_f}{2} \|x^{k+1} - x^k\|^2.$$

Hence, (2.4) holds whenever $L_k \ge L_f + c$. This together with the definition of $\bar L_k$ implies that $\bar L_k / \tau < L_f + c$, that is, $\bar L_k < \tau (L_f + c)$. In view of the definition of $n_k$, we further have

$$L_{\min} \tau^{n_k - 1} \le L_k^0 \tau^{n_k - 1} = \bar L_k < \tau (L_f + c).$$

Hence,

$$n_k \le \left\lceil \frac{\log(L_f + c) - \log(L_{\min})}{\log \tau} + 1 \right\rceil.$$

Combining the above two cases, we see that the conclusion holds. ■


We next establish convergence of the outer iterations of the NPG method.

Theorem 2.2 Let $\{x^k\}$ be the sequence generated by the above NPG method. Then the following statements hold:

(1) $\{f(x^k)\}$ converges and $\{\|x^k - x^{k-1}\|\} \to 0$.

(2) Let $x^*$ be an arbitrary accumulation point of $\{x^k\}$ and $J^* = \{j : x^*_j \ne 0\}$. Then $x^*$ is a stationary point of the problem

$$\begin{array}{ll} \min & f(x) \\ \text{s.t.} & \sum_{i=1}^n x_i = 1, \\ & 0 \le x_j \le u, \ j \in J^*; \\ & x_j = 0, \ j \notin J^*. \end{array} \qquad (2.5)$$

Suppose further that $f$ is convex. Then

(2a) $x^*$ is a local minimizer of problem (2.1) if $\|x^*\|_0 = r$;
(2b) $x^*$ is a minimizer of problem (2.5) if $\|x^*\|_0 < r$.


Proof. (1) Notice that $f$ is continuous on $\Delta = \{x \in \Re^n : \sum_{i=1}^n x_i = 1, \ 0 \le x_i \le u \ \forall i\}$. Since $\{x^k\} \subset \Delta$ and $\Delta$ is compact, it follows that $\{f(x^k)\}$ is bounded below. Let $\ell(k)$ be an integer such that $[k - M]^+ \le \ell(k) \le k$ and

$$f(x^{\ell(k)}) = \max_{[k-M]^+ \le i \le k} f(x^i).$$

It is not hard to observe from (2.4) that $\{f(x^{\ell(k)})\}$ is decreasing. Hence, $\lim_{k \to \infty} f(x^{\ell(k)}) = \hat f$ for some $\hat f \in \Re$. Using this relation, (2.4), and an induction argument similar to the one used in [29], one can show that for all $j \ge 1$,

$$\lim_{k \to \infty} d^{\ell(k) - j} = 0, \qquad \lim_{k \to \infty} f(x^{\ell(k) - j}) = \hat f,$$

where $d^k = x^{k+1} - x^k$ for all $k \ge 0$. In view of these equalities, the uniform continuity of $f$ over $\Delta$, and an argument similar to that in [29], we can conclude that $\{f(x^k)\}$ converges and $\{\|x^k - x^{k-1}\|\} \to 0$.

(2) Let $x^*$ be an arbitrary accumulation point of $\{x^k\}$. Then there exists a subsequence $\mathcal{K}$ such that $\{x^k\}_{\mathcal{K}} \to x^*$, which together with $\|x^k - x^{k-1}\| \to 0$ implies that $\{x^{k-1}\}_{\mathcal{K}} \to x^*$. By passing to a subsequence of $\mathcal{K}$ if necessary, assume without loss of generality that there exists some index set $J$ such that $x^k_j = 0$ for every $j \notin J$, $k \in \mathcal{K}$ and $x^k_j > 0$ for all $j \in J$, $k \in \mathcal{K}$. Let $\bar L_k$ denote the final value of $L_k$ at the $k$th outer iteration. From the proof of Theorem 2.1 we know that $\bar L_k \in [L_{\min}, \tau (L_f + c)]$. By the definition of $x^k$, one can see that $x^k$ is a minimizer of the problem

$$\min_{x \in \Delta_r} \left\{ \nabla f(x^{k-1})^T (x - x^{k-1}) + \frac{\bar L_{k-1}}{2} \|x - x^{k-1}\|^2 \right\}.$$

Using this fact and the definition of $J$, one can observe that $x^k$ is also the minimizer of the problem

$$\min_{x \in \Omega} \left\{ \nabla f(x^{k-1})^T (x - x^{k-1}) + \frac{\bar L_{k-1}}{2} \|x - x^{k-1}\|^2 \right\}, \qquad (2.6)$$

where

$$\Omega = \left\{ x \in \Re^n : \sum_{i=1}^n x_i = 1, \ 0 \le x_j \le u, \ j \in J; \ x_j = 0, \ j \notin J \right\}.$$

By the first-order optimality conditions of (2.6), we have

$$-\nabla f(x^{k-1}) - \bar L_{k-1} (x^k - x^{k-1}) \in \mathcal{N}_\Omega(x^k) \quad \forall k \in \mathcal{K}, \qquad (2.7)$$

where $\mathcal{N}_\Omega(x)$ denotes the normal cone of $\Omega$ at $x$. Using $\bar L_{k-1} \in [L_{\min}, \tau (L_f + c)]$, $\{x^{k-1}\}_{\mathcal{K}} \to x^*$, $\|x^k - x^{k-1}\| \to 0$, outer continuity of $\mathcal{N}_\Omega(\cdot)$, and taking limits on both sides of (2.7) as $k \in \mathcal{K} \to \infty$, one can obtain that

$$-\nabla f(x^*) \in \mathcal{N}_\Omega(x^*). \qquad (2.8)$$

Let $\Omega^*$ be the feasible region of problem (2.5). Clearly, $J^* \subseteq J$ and hence $\Omega^* \subseteq \Omega$, which implies that $\mathcal{N}_\Omega(x^*) \subseteq \mathcal{N}_{\Omega^*}(x^*)$. It then follows from (2.8) that $-\nabla f(x^*) \in \mathcal{N}_{\Omega^*}(x^*)$. Hence, $x^*$ is a stationary point of problem (2.5).

We next prove statements (2a) and (2b) under the assumption that $f$ is convex.

(2a) Suppose that $\|x^*\|_0 = r$ and $f$ is convex. We will show that $x^*$ is a local minimizer of problem (2.1). Let $\epsilon = \min\{x^*_j : j \in J^*\}$,

$$\mathcal{O}(x^*; \epsilon) = \{x \in \Omega^* : \|x - x^*\| < \epsilon\}, \qquad \tilde{\mathcal{O}}(x^*; \epsilon) = \{x \in \Delta_r : \|x - x^*\| < \epsilon\},$$

where $\Omega^*$ is defined above. Since $f$ is convex and $x^*$ is a stationary point of (2.5), one can conclude that $x^*$ is a minimizer of problem (2.5), which implies that $f(x) \ge f(x^*)$ for all $x \in \mathcal{O}(x^*; \epsilon)$. In addition, using the definition of $\epsilon$ and $|J^*| = r$, it is not hard to observe that $\tilde{\mathcal{O}}(x^*; \epsilon) = \mathcal{O}(x^*; \epsilon)$: every $x \in \Delta_r$ within distance $\epsilon$ of $x^*$ has $x_j > 0$ for all $j \in J^*$, and the cardinality bound then forces $x_j = 0$ for $j \notin J^*$. It then follows that $f(x) \ge f(x^*)$ for all $x \in \tilde{\mathcal{O}}(x^*; \epsilon)$, which implies that $x^*$ is a local minimizer of problem (2.1).

(2b) Suppose that $\|x^*\|_0 < r$ and $f$ is convex. Recall from above that $x^*$ is a stationary point of (2.5). Moreover, notice that problem (2.5) becomes a convex optimization problem when $f$ is convex. Therefore, the conclusion of this statement immediately follows. ■


One can observe that problem (2.3) is equivalent to

$$x^{k+1} \in \mathrm{Arg}\min_{x \in \Delta_r} \left\| x - \left( x^k - \frac{1}{L_k} \nabla f(x^k) \right) \right\|^2,$$

which is a special case of the more general problem

$$\min_{x \in \Delta_r} \|x - a\|^2 \qquad (2.9)$$

for some $a \in \Re^n$. In the remainder of this section we will show that problem (2.9) has a closed-form solution, and moreover, it can be found in linear time. Before proceeding, we review a technical lemma established in [22].
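
The equivalence between (2.3) and the projection form can be checked by completing the square. Writing $g = \nabla f(x^k)$ and $L = L_k$ (shorthand introduced here only for this check), one has

$$g^T (x - x^k) + \frac{L}{2} \|x - x^k\|^2 = \frac{L}{2} \left\| x - \left( x^k - \frac{1}{L} g \right) \right\|^2 - \frac{1}{2L} \|g\|^2.$$

The last term does not depend on $x$ and $L > 0$, so the two problems have the same minimizers over $\Delta_r$.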


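Lemma 2.1 Let $X_i \subseteq \Re$ and $\phi_i : \Re \to \Re$ for $i = 1, \ldots, n$ be given. Suppose that $r$ is a positive integer and $0 \in X_i$ for all $i$. Consider the following $l_0$ minimization problem:

$$\min \left\{ \phi(x) = \sum_{i=1}^n \phi_i(x_i) : \|x\|_0 \le r, \ x \in X_1 \times \cdots \times X_n \right\}. \qquad (2.10)$$

Let $\tilde x^*_i \in \mathrm{Arg}\min\{\phi_i(x_i) : x_i \in X_i\}$ and $I^* \subseteq \{1, \ldots, n\}$ be the index set corresponding to the $r$ largest values of $\{v^*_i\}_{i=1}^n$, where $v^*_i = \phi_i(0) - \phi_i(\tilde x^*_i)$ for $i = 1, \ldots, n$. Then $x^*$ is an optimal solution of problem (2.10), where $x^*$ is defined as follows:

$$x^*_i = \begin{cases} \tilde x^*_i & \text{if } i \in I^*; \\ 0 & \text{otherwise}, \end{cases} \qquad i = 1, \ldots, n.$$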
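
As an illustration of the selection rule in Lemma 2.1, the sketch below solves a small separable problem with $\phi_i(t) = (t - b_i)^2$ and $X_i = [0, 1]$. The function name `solve_separable_l0` and the data `b` are hypothetical and chosen only for this example.

```python
from typing import Callable, List, Sequence


def solve_separable_l0(phis: Sequence[Callable[[float], float]],
                       minimizers: Sequence[float], r: int) -> List[float]:
    """Apply the rule of Lemma 2.1.

    phis[i] is the i-th separable term and minimizers[i] is a minimizer of
    phis[i] over X_i (with 0 in X_i). The r indices with the largest gain
    phi_i(0) - phi_i(minimizer_i) keep their minimizer; the rest are set to 0.
    """
    gains = [phi(0.0) - phi(m) for phi, m in zip(phis, minimizers)]
    top = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)[:r]
    x = [0.0] * len(phis)
    for i in top:
        x[i] = minimizers[i]
    return x


# Hypothetical data: phi_i(t) = (t - b_i)^2 on X_i = [0, 1].
b = [0.9, -0.3, 0.5, 0.2]
phis = [lambda t, bi=bi: (t - bi) ** 2 for bi in b]
minimizers = [min(max(bi, 0.0), 1.0) for bi in b]  # clip b_i to [0, 1]
x_opt = solve_separable_l0(phis, minimizers, r=2)
```

Here the gains are roughly 0.81, 0, 0.25 and 0.04, so the first and third coordinates are kept.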
We are now ready to establish that problem (2.9) has a closed-form solution that can be computed efficiently.

Theorem 2.3 Given any $a \in \Re^n$, let $I^* \subseteq \{1, \ldots, n\}$ be the index set corresponding to the $r$ largest values of $\{a_i\}_{i=1}^n$. Suppose that $\lambda^* \in \Re$ is such that

$$\sum_{i \in I^*} \Pi_{[0,u]}(a_i + \lambda^*) = 1, \qquad (2.11)$$

where

$$\Pi_{[0,u]}(t) = \begin{cases} 0 & \text{if } t < 0; \\ t & \text{if } 0 \le t \le u; \\ u & \text{if } t > u \end{cases} \qquad \forall t \in \Re.$$

Then $x^*$ is an optimal solution of problem (2.9), where $x^*$ is defined as follows:

$$x^*_i = \begin{cases} \Pi_{[0,u]}(a_i + \lambda^*) & \text{if } i \in I^*; \\ 0 & \text{otherwise}, \end{cases} \qquad i = 1, \ldots, n.$$

Proof. Let $d(x)$ and $d^*$ denote the objective function and the optimal value of (2.9), respectively, and let $x^*$ be defined above. We can observe that $\|x^*\|_0 \le r$, $\sum_{i=1}^n x^*_i = 1$ and $0 \le x^*_j \le u$ for all $j$, which implies that $x^*$ is a feasible solution of (2.9), namely, $x^* \in \Delta_r$. Hence, $d(x^*) \ge d^*$. Let $\psi(t) = t^2 - (t - \Pi_{[0,u]}(t))^2$ for every $t \in \Re$. It is not hard to see that $\psi$ is differentiable, and moreover,

$$\psi'(t) = 2t - 2(t - \Pi_{[0,u]}(t)) = 2 \Pi_{[0,u]}(t) \ge 0.$$

Hence, $\psi(t)$ is increasing in $(-\infty, \infty)$. Let $\phi_i(x_i) = (x_i - a_i - \lambda^*)^2$, $X_i = [0, u]$, $\tilde x^*_i = \arg\min\{\phi_i(x_i) : x_i \in X_i\}$ and $v^*_i = \phi_i(0) - \phi_i(\tilde x^*_i)$ for all $i$. One can observe that $\tilde x^*_i = \Pi_{[0,u]}(a_i + \lambda^*)$ and $v^*_i = \psi(a_i + \lambda^*)$ for all $i$. By the definition of $I^*$ and the monotonicity of $\psi$, we conclude that $I^*$ is the index set corresponding to the $r$ largest values of $\{v^*_i\}_{i=1}^n$. In view of Lemma 2.1 and the definitions of $x^*$ and $\tilde x^*$, one can see that $x^*$ is an optimal solution to the problem

$$\underline{d}^* = \min_{0 \le x \le u, \ \|x\|_0 \le r} \left\{ \|x - a\|^2 - 2 \lambda^* \left( \sum_{i=1}^n x_i - 1 \right) \right\},$$

and hence,

$$\underline{d}^* = \|x^* - a\|^2 - 2 \lambda^* \left( \sum_{i=1}^n x^*_i - 1 \right) = \|x^* - a\|^2 = d(x^*).$$

In addition, we can observe that $d^* \ge \underline{d}^*$. It then follows that $d^* \ge d(x^*)$. Recall that $d(x^*) \ge d^*$. Hence, we have $d(x^*) = d^*$. Using this relation and $x^* \in \Delta_r$, we conclude that $x^*$ is an optimal solution of problem (2.9). ■


We next show that a $\lambda^*$ satisfying (2.11) can be computed in linear time, which together with Theorem 2.3 implies that problem (2.9) can be solved in linear time as well.

Theorem 2.4 For any $a \in \Re^n$ and $u \ge 1/n$, the equation

$$h(\lambda) := \sum_{i=1}^n \Pi_{[0,u]}(a_i + \lambda) - 1 = 0 \qquad (2.12)$$

has at least one root $\lambda^*$, and moreover, it can be computed in $O(n)$ time.

Proof. One can observe that $h$ is continuous in $(-\infty, \infty)$, and moreover, $h(\lambda) = -1$ when $\lambda$ is sufficiently small and $h(\lambda) = nu - 1 \ge 0$ when $\lambda$ is sufficiently large. Hence, (2.12) has at least one root $\lambda^*$.

We next show that a root $\lambda^*$ of (2.12) can be computed in $O(n)$ time. Indeed, it is not hard to observe that $h$ is a piecewise linear increasing function in $(-\infty, \infty)$ with breakpoints $\{-a_1, \ldots, -a_n, -a_1 + u, \ldots, -a_n + u\}$. Suppose that only $k$ of these breakpoints are distinct and they are arranged in strictly increasing order $\{\lambda_1 < \cdots < \lambda_k\}$. The value of $h$ at each $\lambda_i$ and the slope of each piece of $h$ can be evaluated iteratively. Indeed, let $\lambda_0 = -\infty$. Observe that $h(\lambda) = -1$ for all $\lambda \le \lambda_1$. Hence, $h$ has slope $s_0 = 0$ in $(-\infty, \lambda_1]$ and $h(\lambda_1) = -1$. Suppose that $h$ has slope $s_{i-1}$ in $(\lambda_{i-1}, \lambda_i]$, that $h(\lambda_i)$ is already computed, and that $m_i$ elements of $\{-a_1, \ldots, -a_n\}$ and $n_i$ elements of $\{-a_1 + u, \ldots, -a_n + u\}$ are equal to $\lambda_i$. Then the slope of $h$ in $(\lambda_i, \lambda_{i+1}]$ is $s_i = s_{i-1} + m_i - n_i$, which yields $h(\lambda_{i+1}) = h(\lambda_i) + s_i (\lambda_{i+1} - \lambda_i)$ for $i = 1, \ldots, k - 1$. Since $h(\lambda_1) = -1$, $h(\lambda_k) = nu - 1 \ge 0$ and $h$ is increasing, there exists some $1 \le j < k$ such that $h(\lambda_j) < 0$ and $h(\lambda_{j+1}) \ge 0$. If $h(\lambda_{j+1}) = 0$, then $\lambda^* = \lambda_{j+1}$ is a root of (2.12). Otherwise, $\lambda^* \in (\lambda_j, \lambda_{j+1})$ and $h(\lambda^*) = 0$. Using these facts and the relation $h(\lambda) = h(\lambda_j) + s_j (\lambda - \lambda_j)$ for $\lambda \in (\lambda_j, \lambda_{j+1})$, we have

$$\lambda^* = \lambda_j - h(\lambda_j) / s_j.$$

In addition, one can observe that the arithmetic operation cost of this root-finding procedure is $O(n)$. ■
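
The closed-form projection of Theorem 2.3 combined with the breakpoint scan from the proof of Theorem 2.4 can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, and for simplicity the breakpoints are ordered with a general-purpose sort, so this version costs $O(n \log n)$ rather than the linear time stated in the theorem.

```python
from typing import List


def clip(t: float, u: float) -> float:
    """Projection of the scalar t onto the interval [0, u]."""
    return min(max(t, 0.0), u)


def find_shift(a: List[float], u: float) -> float:
    """Return lam with sum_i clip(a_i + lam, u) = 1, assuming u >= 1/len(a)."""
    # The slope of h gains 1 at breakpoint -a_i and loses 1 at -a_i + u.
    events = sorted([(-ai, 1) for ai in a] + [(-ai + u, -1) for ai in a])
    lam_prev, h_prev, slope = events[0][0], -1.0, 0
    for lam, delta in events:
        h = h_prev + slope * (lam - lam_prev)
        if h >= 0.0:
            # Root lies in (lam_prev, lam]; slope > 0 because h rose from h_prev < 0.
            return lam_prev - h_prev / slope
        lam_prev, h_prev = lam, h
        slope += delta
    return lam_prev  # only reached through rounding, since h ends at n*u - 1 >= 0


def project_onto_delta_r(a: List[float], r: int, u: float) -> List[float]:
    """Keep the r largest entries of a, shift them by lam, and clip to [0, u]."""
    keep = sorted(range(len(a)), key=lambda i: a[i], reverse=True)[:r]
    lam = find_shift([a[i] for i in keep], u)
    x = [0.0] * len(a)
    for i in keep:
        x[i] = clip(a[i] + lam, u)
    return x


# Hypothetical input: the upper bound u = 0.6 is active for the largest entry.
x_proj = project_onto_delta_r([0.7, 0.1, 0.4, -0.2], r=2, u=0.6)
```

For this input the two largest entries are 0.7 and 0.4; the shift comes out at approximately zero, the first entry is clipped to 0.6, and the second stays near 0.4.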

3 Numerical results 

In this section, we conduct numerical experiments to compare the performance of the NPG method proposed in Section 2 with the hybrid evolutionary algorithm [28] and the hybrid half thresholding algorithm [30] for solving index tracking problems. It should be mentioned that the NPG method solves the $l_0$ constrained model (1.1) with $u = 0.5$, while the hybrid evolutionary algorithm solves a mixed integer programming model and the hybrid half thresholding algorithm [30] solves an $l_{1/2}$ regularized index tracking model. These three methods were coded in Matlab, and all computations were performed on an HP dx7408 PC (Intel Core E4500 CPU, 2.2 GHz, 1 GB RAM) with Matlab 7.9 (R2009b).

The data sets used in our experiments are selected from the standard ones in OR-library [5] and the CSI 300 index from the China Shanghai-Shenzhen stock market. For the standard data sets, weekly prices of the stocks from 1992 to 1997 of Hang Seng (Hong Kong), DAX 100 (Germany), FTSE (Great Britain), Standard and Poor's 100 (USA), the Nikkei index (Japan), the Standard and Poor's 500 (USA), Russell 2000 (USA) and Russell 3000 (USA) are used. For the CSI 300 index, the daily prices of 300 stocks from 2011 to 2013 in the China stock market are considered. According to the sample scale, we divide the above data sets into two categories: small data sets including Hang Seng, DAX 100, FTSE, Standard and Poor's 100, and the Nikkei index; and large data sets including CSI 300, Standard and Poor's 500, Russell 2000 and Russell 3000. As in Torrubiano and Alberto [28], each data set is partitioned into two subsets: a training set and a testing set. The training set, called the in-sample set, consists of the first half of the data and is used to compute the optimal index tracking portfolio. We also use the in-sample set and the formula for $TE$ given in (1.1) to calculate the tracking error, which is called the in-sample tracking error ($TEI$) of the portfolio. The testing set, called the out-of-sample set, contains the rest of the data and is used to test the performance of the resulting optimal index tracking portfolio. In particular, we use the formula for $TE$ in (1.1) with $(R, y)$ replaced by the out-of-sample set to calculate the tracking error, which is called the out-of-sample tracking error ($TEO$) of the portfolio. In addition, we denote the true sparsity of the optimal output generated by each method by $S_{\mathrm{true}}$.
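
The in-sample and out-of-sample evaluation described above can be sketched as follows. The helper names and the toy returns are assumptions for illustration; when the number of periods is odd, the sketch puts the extra period in the out-of-sample half, which is a choice made here.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def tracking_error(R: Matrix, y: List[float], x: List[float]) -> float:
    """TE(x) = ||y - R x||^2 / T on the given sample."""
    residuals = [yt - sum(rti * xi for rti, xi in zip(row, x))
                 for row, yt in zip(R, y)]
    return sum(e * e for e in residuals) / len(y)


def in_out_errors(R: Matrix, y: List[float], x: List[float]) -> Tuple[float, float]:
    """Return (TEI, TEO): first half is in-sample, the rest is out-of-sample."""
    half = len(y) // 2
    tei = tracking_error(R[:half], y[:half], x)
    teo = tracking_error(R[half:], y[half:], x)
    return tei, teo


# Toy data (made up): 4 periods, 2 constituents, equal weights.
R = [[0.01, 0.02], [0.03, -0.01], [0.00, 0.02], [-0.01, 0.01]]
y = [0.015, 0.01, 0.02, 0.0]
tei, teo = in_out_errors(R, y, [0.5, 0.5])
```

In practice the weights would be fitted on the in-sample half only and then evaluated on both halves; here they are fixed to keep the example short.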

For the NPG method, we set $L_{\min} = 10^{-8}$, $L_{\max} = 10^{8}$, $\tau = 2$, $c = 10^{-4}$, and $M = 3$ for small data sets and $M = 5$ for large data sets. For the hybrid half thresholding algorithm, the lower and upper bounds are chosen to be 0 and 0.5, respectively. We terminate these methods when the absolute change of the approximate solutions over two consecutive iterations is below $10^{-6}$ or the number of iterations reaches 10,000. For the hybrid evolutionary algorithm, we set the lower bound to 0, the upper bound to 0.5, the initial population size to 100, the mutation probability to 1%, the crossover probability to 50%, and the maximum number of iterations to 10,000. In addition, we randomly choose a feasible point of problem (1.1) as a common initial point for these three methods.

In order to measure the out-of-sample performance and the consistency between in-sample and out-of-sample, we introduce the following two criteria.

• Consistency: The consistency between in-sample and out-of-sample tracking errors of a portfolio given by a method $A$ is defined as

$$\mathrm{Cons}(A) = |TEI_A - TEO_A|,$$

where $TEI_A$ and $TEO_A$ are the in-sample and out-of-sample tracking errors of a portfolio generated by the method $A$. Clearly, a smaller value of $\mathrm{Cons}(A)$ means that the portfolio given by $A$ is more consistent between in-sample and out-of-sample tracking errors and thus more robust (or less sensitive) with respect to the sample data used for model (1.1).

• Superiority of out-of-sample: We define

$$\mathrm{SupO}(A, B) = \frac{TEO_B - TEO_A}{TEO_B} \times 100\%,$$

where $TEO_A$ and $TEO_B$ are the out-of-sample tracking errors of the portfolios given by methods $A$ and $B$, respectively. One can see that if $\mathrm{SupO}(A, B) > 0$, then $TEO_A$ is smaller than $TEO_B$, i.e., the portfolio given by method $A$ is superior to that given by method $B$ in terms of out-of-sample tracking error; it is then very likely that the portfolio given by $A$ has a smaller expected tracking error and thus is a better estimate for the underlying statistical regression model.
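
The two criteria can be computed as in the short sketch below, which assumes that SupO(A, B) is the relative reduction (TEO_B - TEO_A) / TEO_B expressed in percent; this reading reproduces the entries reported in Table 2. The function names are hypothetical. The inputs are the Hang Seng, density 5 entries of Table 1.

```python
def consistency(tei: float, teo: float) -> float:
    """Cons(A) = |TEI_A - TEO_A|."""
    return abs(tei - teo)


def superiority_out_of_sample(teo_a: float, teo_b: float) -> float:
    """SupO(A, B) in percent: positive when A has the smaller out-of-sample error."""
    return (teo_b - teo_a) / teo_b * 100.0


# Hang Seng, density 5 (values from Table 1).
cons_l0 = consistency(6.23e-5, 5.17e-5)                   # l_0 model
sup_vs_mip = superiority_out_of_sample(5.17e-5, 8.87e-5)  # l_0 against MIP
sup_vs_half = superiority_out_of_sample(5.17e-5, 7.07e-5)  # l_0 against l_1/2
```

The two SupO values round to 41.7 and 26.9, against 41.7 and 26.8 in Table 2; the consistency value 1.06e-5 differs from the tabulated 1.05e-5 in the last digit. Both small gaps are presumably because the table was computed from unrounded errors.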

3.1 Results on small data sets 

In this subsection, we compare the performance of the NPG method with the hybrid evolutionary algorithm [28] and the hybrid half thresholding algorithm [30] on five small data sets, which are Hang Seng, DAX 100, FTSE, Standard and Poor's 100, and Nikkei 225. For convenience of presentation, we abbreviate these three approaches as $l_0$, MIP and $l_{1/2}$ since they are the methods for the $l_0$, MIP and $l_{1/2}$ models, respectively. In order to compare the performance of these methods fairly, we tailor their model parameters so that the resulting portfolios have the same density (i.e., the same number of nonzero entries).

Numerical results are presented in Tables 1 and 2, where $N$ denotes the number of assets in a data set. In particular, we report in Table 1 the in-sample error and out-of-sample error of the portfolios generated by the aforementioned three methods. In Table 2, we report the consistency between in-sample and out-of-sample errors, and the superiority of out-of-sample errors for the portfolios generated by these methods. The number of nonzero entries in the portfolios given by these methods is listed in the column named "density". From Table 2, we can make the following observations.

(i) The $l_0$-based method (i.e., NPG method) generally has higher consistency between in-sample error and out-of-sample error than the MIP- and $l_{1/2}$-based methods (namely, hybrid evolutionary and half thresholding algorithms) since $\mathrm{Cons}(l_0) < \mathrm{Cons}(MIP)$ holds for 100% (30/30) instances and $\mathrm{Cons}(l_0) < \mathrm{Cons}(l_{1/2})$ holds for 73.3% (22/30) instances.

(ii) The $l_0$-based method is generally superior to the MIP- and $l_{1/2}$-based methods in terms of out-of-sample error since $\mathrm{SupO}(l_0, MIP) > 0$ holds for 90% (27/30) instances and $\mathrm{SupO}(l_0, l_{1/2}) > 0$ holds for 93.3% (28/30) instances.

3.2 Results on large data sets 

In this subsection, we compare the performance of the $l_0$-based method (i.e., NPG method) with the MIP- and $l_{1/2}$-based methods (namely, hybrid evolutionary and half thresholding algorithms) on four large data sets, which are Standard and Poor's 500, Russell 2000, Russell 3000 and the Chinese index CSI 300. For a fair comparison of the performance of these methods, we tailor their model parameters so that the resulting portfolios have the same density (i.e., the same number of nonzero entries).
Table 1: The in-sample and out-of-sample tracking errors on small data sets.

                           l_0                       MIP                       l_1/2
Index             Density  TEI      TEO      S_true  TEI      TEO      S_true  TEI      TEO      S_true
Hang Seng (N=31)  5        6.23e-5  5.17e-5  5       5.69e-5  8.87e-5  5       8.36e-5  7.07e-5  5
                  6        4.29e-5  3.45e-5  6       4.85e-5  7.82e-5  6       8.58e-5  7.19e-5  6
                  7        2.37e-5  3.83e-5  7       3.26e-5  5.38e-5  7       6.45e-5  4.59e-5  7
                  8        2.38e-5  2.50e-5  8       2.06e-5  3.09e-5  8       3.20e-5  2.95e-5  8
                  9        2.00e-5  2.16e-5  9       1.95e-5  2.80e-5  9       3.96e-5  2.44e-5  9
                  10       1.58e-5  1.55e-5  10      1.86e-5  2.77e-5  10      2.33e-5  2.34e-5  10
DAX (N=85)        5        4.10e-5  1.08e-4  5       2.21e-5  1.02e-4  5       4.88e-5  1.18e-4  5
                  6        3.07e-5  1.00e-4  6       1.82e-5  9.43e-5  6       3.86e-5  1.13e-4  6
                  7        2.56e-5  9.68e-5  7       1.47e-5  1.02e-4  7       2.47e-5  1.04e-4  7
                  8        1.68e-5  8.71e-5  8       1.48e-5  8.78e-5  8       2.66e-5  9.36e-5  8
                  9        1.54e-5  8.23e-5  9       1.05e-5  8.63e-5  9       3.44e-5  9.72e-5  9
                  10       1.88e-5  8.11e-5  10      8.21e-6  7.76e-5  10      2.23e-5  1.03e-4  10
FTSE (N=89)       5        1.05e-4  8.43e-5  5       6.92e-5  9.87e-5  5       1.22e-4  8.80e-5  5
                  6        7.29e-5  8.74e-5  6       5.50e-5  9.14e-5  6       1.04e-4  8.78e-5  6
                  7        6.83e-5  8.18e-5  7       4.15e-5  1.02e-4  7       6.70e-5  9.67e-5  7
                  8        5.81e-5  6.00e-5  8       3.50e-5  7.44e-5  8       6.11e-5  7.10e-5  8
                  9        6.51e-5  5.67e-5  9       2.49e-5  8.59e-5  9       7.08e-5  5.72e-5  9
                  10       6.70e-5  6.94e-5  10      2.18e-5  8.01e-5  10      5.43e-5  7.27e-5  10
S&P (N=98)        5        8.74e-5  8.94e-5  5       4.50e-5  1.14e-4  5       1.02e-4  1.14e-4  5
                  6        5.87e-5  8.47e-5  6       3.37e-5  1.01e-4  6       7.93e-5  8.88e-5  6
                  7        3.51e-5  7.69e-5  7       3.36e-5  8.93e-5  7       6.70e-5  7.58e-5  7
                  8        5.50e-5  5.75e-5  8       2.51e-5  7.35e-5  8       6.41e-5  6.58e-5  8
                  9        3.71e-5  5.09e-5  9       2.11e-5  5.92e-5  9       5.78e-5  6.56e-5  9
                  10       2.93e-5  4.57e-5  10      1.85e-5  5.10e-5  10      5.22e-5  5.07e-5  10
Nikkei (N=225)    5        1.34e-4  1.32e-4  5       6.02e-5  1.44e-4  5       1.22e-4  1.43e-4  5
                  6        9.48e-5  9.92e-5  6       5.13e-5  1.20e-4  6       8.26e-5  9.71e-5  6
                  7        7.72e-5  9.77e-5  7       3.93e-5  1.11e-4  7       6.89e-5  1.11e-4  7
                  8        9.24e-5  8.70e-5  8       3.12e-5  1.18e-4  8       7.09e-5  9.09e-5  8
                  9        4.87e-5  7.68e-5  9       2.78e-5  1.18e-4  9       4.52e-5  8.22e-5  9
                  10       6.39e-5  6.75e-5  10      2.36e-5  8.25e-5  10      5.37e-5  6.77e-5  10


Numerical results are reported in Tables 3 and 4, where $N$ denotes the number of assets in a data set. In particular, we present in Table 3 the in-sample error and out-of-sample error of the portfolios generated by the above three methods. In Table 4, we present the CPU time of these methods and the superiority of out-of-sample errors of the portfolios given by these methods. The number of nonzero entries in the portfolios given by these methods is listed in the column named "density". We can make the following observations from Table 4.

(i) The $l_0$-based method (i.e., NPG method) generally has higher consistency between in-sample error and out-of-sample error than the MIP- and $l_{1/2}$-based methods (namely, hybrid evolutionary and half thresholding algorithms) since $\mathrm{Cons}(l_0) < \mathrm{Cons}(MIP)$ holds for 100% (28/28) instances and $\mathrm{Cons}(l_0) < \mathrm{Cons}(l_{1/2})$ holds for 89.3% (25/28) instances.

(ii) The $l_0$-based method is generally superior to the MIP- and $l_{1/2}$-based methods in terms of out-of-sample error since $\mathrm{SupO}(l_0, MIP) > 0$ holds for all instances and $\mathrm{SupO}(l_0, l_{1/2}) > 0$ holds for 92.9% (26/28) instances.

(iii) The $l_0$-based method also generally outperforms the MIP- and $l_{1/2}$-based methods in terms of speed.

Table 2: The comparison on small data sets (SupO values in %).

Index             Density  Cons(l_0)  Cons(MIP)  Cons(l_1/2)  SupO(l_0,MIP)  SupO(l_0,l_1/2)
Hang Seng (N=31)  5        1.05e-5    3.18e-5    1.29e-5      41.7           26.8
                  6        8.37e-6    2.97e-5    1.39e-5      55.9           52.1
                  7        1.46e-5    2.13e-5    1.86e-5      28.8           16.4
                  8        1.23e-6    1.03e-5    2.43e-6      19.0           15.3
                  9        1.66e-6    8.50e-6    1.52e-5      22.9           11.4
                  10       3.54e-7    9.15e-6    8.50e-8      44.3           33.9
DAX (N=85)        5        6.72e-5    7.97e-5    6.94e-5      -6.28          8.47
                  6        6.95e-5    7.61e-5    7.49e-5      -6.27          11.7
                  7        7.12e-5    8.69e-5    7.96e-5      4.72           7.26
                  8        7.03e-5    7.30e-5    6.70e-5      0.79           6.96
                  9        6.69e-5    7.58e-5    6.28e-5      4.68           15.3
                  10       6.23e-5    6.94e-5    8.11e-5      -4.52          21.6
FTSE (N=89)       5        2.11e-5    2.95e-5    3.40e-5      14.6           4.27
                  6        1.45e-5    3.64e-5    1.66e-5      4.41           0.42
                  7        1.35e-5    6.05e-5    2.98e-5      19.8           15.4
                  8        1.85e-6    3.94e-5    9.95e-6      19.3           15.5
                  9        8.39e-6    6.11e-5    1.36e-5      34.0           0.74
                  10       2.46e-6    5.83e-5    1.85e-5      13.3           4.52
S&P (N=98)        5        2.10e-6    6.93e-5    1.17e-5      21.7           21.3
                  6        2.60e-5    6.70e-5    9.48e-6      15.9           4.66
                  7        4.18e-5    5.57e-5    8.80e-6      13.9           -1.40
                  8        2.58e-6    4.83e-5    1.70e-6      21.7           12.6
                  9        1.38e-5    3.81e-5    7.81e-6      14.0           22.4
                  10       1.64e-5    3.25e-5    1.49e-6      10.4           9.96
Nikkei (N=225)    5        2.10e-6    8.39e-5    2.14e-5      8.28           7.81
                  6        4.38e-6    6.83e-5    1.46e-5      17.0           -2.11
                  7        2.05e-5    7.16e-5    4.19e-5      11.9           11.8
                  8        5.40e-6    8.64e-5    2.00e-5      26.1           4.29
                  9        2.81e-5    8.98e-5    3.70e-5      34.8           6.60
                  10       3.60e-6    5.89e-5    1.39e-5      18.1           0.23

4 Concluding remarks 

In this paper we proposed an index tracking model with budget, no-short selling 
and a cardinality constraint. Also, we developed an efficient nonmonotone projected 
gradient (NPG) method for solving this model. At each iteration, this method usually 
solves several projected gradient subproblems. We showed that each subproblem has 
a closed-form solution, which can be computed in linear time. Under some suitable 
assumptions, we showed that any accumulation point of the sequence generated by 
the NPG method is a local minimizer of the cardinality-constrained index tracking 
problem. We also conducted empirical tests on the data sets from OR-library [5]
and the CSI 300 index from the China Shanghai-Shenzhen stock market to compare our
method with the hybrid evolutionary algorithm [28] and the hybrid half thresholding
algorithm [30] for index tracking. The computational results demonstrate that our
approach generally produces sparse portfolios with smaller out-of-sample tracking
error and higher consistency between in-sample and out-of-sample tracking errors.
Moreover, our method outperforms the other two approaches in terms of speed.

Table 3: The in-sample and out-of-sample tracking errors on large data sets.

                       l_0                      MIP                      l_{1/2}
Index         Density  TEI      TEO      Strue  TEI      TEO      Strue  TEI      TEO      Strue
CSI 300       5        3.34e-5  2.19e-5  5      1.21e-5  2.43e-5  5      2.39e-5  1.99e-5  5
(N=300)       6        2.34e-5  2.11e-5  6      1.17e-5  2.37e-5  6      1.91e-5  2.11e-5  6
              7        1.86e-5  1.98e-5  7      7.84e-6  2.36e-5  7      1.51e-5  2.09e-5  7
              8        1.67e-5  1.68e-5  8      7.68e-6  2.04e-5  8      1.42e-5  1.92e-5  8
              9        1.67e-5  1.54e-5  9      7.23e-6  1.85e-5  9      1.26e-5  1.63e-5  9
              10       1.13e-5  1.21e-5  10     6.42e-6  1.51e-5  10     1.32e-5  1.33e-5  10
              20       6.29e-6  7.29e-6  20     2.92e-6  7.65e-6  20     6.40e-6  7.64e-6  20
              30       3.72e-6  5.14e-6  30     2.07e-6  5.20e-6  30     4.15e-6  5.55e-6  30
              40       2.39e-6  4.17e-6  40     1.58e-6  7.63e-6  40     3.05e-6  5.30e-5  40
              50       2.87e-6  3.28e-6  50     1.90e-6  5.00e-6  50     2.03e-6  4.53e-6  50
S&P           80       2.85e-6  7.82e-5  80     2.65e-6  9.98e-5  80     1.37e-5  9.85e-5  80
(N=457)       90       2.43e-6  7.52e-5  90     3.01e-6  1.24e-4  90     1.08e-5  9.98e-5  90
              100      2.13e-6  7.39e-5  100    2.50e-6  9.69e-5  100    9.08e-6  1.04e-4  100
              120      1.66e-6  7.59e-5  120    2.58e-5  1.04e-4  120    6.42e-6  9.35e-5  120
              150      1.52e-6  7.95e-5  150    5.64e-6  1.25e-4  150    5.18e-6  1.07e-4  150
              200      1.57e-6  7.94e-5  200    2.13e-6  9.80e-5  200    2.72e-6  9.09e-5  200
Russell 2000  80       4.02e-6  2.07e-4  80     3.62e-6  2.89e-4  80     2.92e-5  2.34e-4  80
(N=1318)      90       3.51e-6  2.08e-4  90     4.95e-6  2.76e-4  90     2.76e-5  2.45e-4  90
              100      3.18e-6  1.70e-4  100    2.61e-6  2.60e-4  100    2.09e-5  2.13e-4  100
              120      2.32e-6  1.68e-4  120    2.80e-6  2.49e-4  120    1.71e-5  2.61e-4  120
              150      1.99e-6  1.94e-4  150    1.16e-5  2.68e-4  150    1.20e-5  2.66e-4  150
              200      9.83e-7  2.28e-4  200    1.42e-6  3.31e-4  200    6.89e-6  3.18e-4  200
Russell 3000  80       6.24e-6  1.34e-4  80     3.90e-6  1.70e-4  80     2.62e-5  1.64e-4  80
(N=2151)      90       5.49e-6  1.14e-4  90     3.33e-6  1.21e-4  90     1.99e-5  1.47e-4  90
              100      4.10e-6  1.05e-4  100    3.48e-6  1.05e-4  100    1.87e-5  1.37e-4  100
              120      2.78e-6  9.82e-5  120    3.01e-6  1.06e-4  120    1.66e-5  1.26e-4  120
              150      1.63e-6  1.00e-4  150    2.48e-6  1.10e-4  150    1.46e-5  1.23e-4  150
              200      1.41e-6  1.06e-4  200    3.22e-6  1.09e-4  200    1.03e-5  1.57e-4  200
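
The projected gradient subproblems mentioned in the concluding remarks amount to projecting a point onto the feasible set. The following is a minimal sketch, assuming the simplified feasible set {x : x >= 0, sum(x) = 1, at most k nonzeros} with no upper bounds on the weights. It uses a sort-based routine (keep the k largest entries, then project them onto the unit simplex) that runs in O(n log n) time, so it is not the linear-time closed form established in this paper. The function names are hypothetical.

```python
def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:
            # The condition holds on a prefix of the sorted values,
            # so the last update gives the correct shift.
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def project_sparse_simplex(v, k):
    """Project v onto {x >= 0, sum(x) = 1, at most k nonzeros} (assumed set)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Keep the indices of the k largest entries, zero out the rest.
    top = sorted(range(len(v)), key=lambda i: v[i], reverse=True)[:k]
    projected = project_simplex([v[i] for i in top])
    x = [0.0] * len(v)
    for i, value in zip(top, projected):
        x[i] = value
    return x


# Toy example: four assets, at most two allowed in the portfolio.
x = project_sparse_simplex([0.5, 0.3, 0.9, -0.2], 2)
print(x)
```

In this toy example the two largest entries (0.9 and 0.5) are kept and shifted so that the weights sum to one, giving approximately 0.7 and 0.3 on those assets and zero elsewhere.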

We remark that the NPG method proposed in this paper can also be used
to solve the subproblems arising in the penalty method or augmented Lagrangian
method when applied to a more general problem of the form

$$\min\ f(x) \quad \text{s.t.} \quad g(x) \le 0, \ h(x) = 0$$

for some g : 3? n —> and h : where is given in (II.2ft . 
Table 4: The comparison on large data sets.

Index         Density  Time(l_0)  Time(MIP)  Time(l_1/2)  Cons(l_0)  Cons(MIP)  Cons(l_1/2)  SupO(l_0,MIP)  SupO(l_0,l_1/2)
CSI 300       5        0.0114     26.7       1.96         4.05e-6    1.22e-5    5.65e-6      18.2           -5.24
(N=300)       6        0.0113     36.0       2.17         2.32e-6    1.20e-5    1.96e-6      10.9           -0.08
              7        0.0039     42.1       2.31         1.22e-6    1.58e-5    5.83e-6      16.3           5.41
              8        0.0097     13.5       2.18         1.91e-7    1.27e-5    4.94e-6      17.3           12.1
              9        0.0078     17.1       2.50         1.35e-6    1.12e-5    3.66e-6      16.7           5.44
              10       0.0053     14.6       2.71         7.95e-7    8.67e-6    1.28e-7      19.7           8.72
              20       0.0078     2.84       4.30         1.00e-6    4.73e-6    1.23e-6      4.65           4.49
              30       0.0060     1.97       6.47         1.42e-6    3.13e-6    1.40e-6      1.21           7.43
              40       0.0064     2.20       6.85         1.78e-7    6.05e-6    2.25e-6      45.4           21.4
              50       0.0083     1.76       7.65         4.10e-7    3.11e-6    2.50e-6      34.5           27.8
S&P           80       0.0271     63.6       8.64         7.53e-5    9.72e-5    8.48e-5      21.7           20.7
(N=457)       90       0.0207     49.0       10.2         7.28e-5    1.21e-4    8.90e-5      39.1           24.6
              100      0.0199     77.0       15.3         7.17e-5    9.44e-5    9.47e-5      23.7           28.8
              120      0.0187     86.9       13.3         7.42e-5    1.02e-4    8.71e-5      27.3           18.8
              150      0.0184     58.7       13.5         7.80e-5    1.20e-4    1.01e-4      36.6           25.4
              200      0.0197     689.3      13.7         7.78e-5    9.58e-5    8.82e-5      19.0           12.7
Russell 2000  80       0.153      577.7      35.7         2.03e-4    2.85e-4    2.05e-4      28.3           11.6
(N=1318)      90       0.137      352.6      27.5         2.04e-4    2.71e-4    2.17e-4      24.7           15.0
              100      0.148      657.8      38.4         1.67e-4    2.58e-4    1.92e-4      34.6           20.1
              120      0.149      449.1      47.2         1.65e-4    2.46e-4    2.44e-4      32.6           35.6
              150      0.113      50.6       56.5         1.92e-4    2.56e-4    2.54e-4      27.6           27.3
              200      0.095      1352.7     46.4         2.27e-4    3.29e-4    3.11e-4      30.9           28.2
Russell 3000  80       0.626      861.1      37.1         1.28e-4    1.66e-4    1.38e-4      21.0           18.6
(N=2151)      90       0.267      1039.5     47.9         1.08e-4    1.18e-4    1.27e-4      6.00           22.3
              100      0.269      913.1      48.5         1.01e-4    1.02e-4    1.19e-4      0.05           23.5
              120      0.248      658.7      88.0         9.54e-5    1.03e-4    1.09e-4      7.26           21.8
              150      0.216      878.7      74.9         9.83e-5    1.08e-4    1.09e-4      9.34           18.9
              200      0.342      1999.9     97.9         1.05e-4    1.05e-4    1.47e-4      2.30           32.4